Skip to content

BUG: Fix TypeError caused by GH13374 #17465

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Sep 10, 2017
Merged

BUG: Fix TypeError caused by GH13374 #17465

merged 10 commits into from
Sep 10, 2017

Conversation

matthax
Copy link
Contributor

@matthax matthax commented Sep 7, 2017

  • closes BUG: Python parser breaks with quotes and multi-char sep  #13374
  • 0 failed, 9873 passed, 1955 skipped, 11 xfailed, 4 warnings in 1475.44 seconds

    added test_none_delimiter in pandas/tests/io/parser/python_parser_only.py

  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

    Bug in :func:read_csv where automatic delimiter detection caused a TypeError to be thrown when a bad line was encountered rather than the correct error message (:issue:13374)

@gfyoung

@pep8speaks
Copy link

pep8speaks commented Sep 7, 2017

Hello @matthax! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on September 09, 2017 at 21:49 Hours UTC

@gfyoung gfyoung added Bug IO CSV read_csv, to_csv labels Sep 7, 2017
@codecov
Copy link

codecov bot commented Sep 7, 2017

Codecov Report

Merging #17465 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17465      +/-   ##
==========================================
- Coverage   91.16%   91.14%   -0.02%     
==========================================
  Files         163      163              
  Lines       49590    49590              
==========================================
- Hits        45209    45200       -9     
- Misses       4381     4390       +9
Flag Coverage Δ
#multiple 88.93% <100%> (ø) ⬆️
#single 40.25% <0%> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/io/parsers.py 95.46% <100%> (ø) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.72% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ee6185e...ced6fc6. Read the comment docs.

@codecov
Copy link

codecov bot commented Sep 7, 2017

Codecov Report

Merging #17465 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17465      +/-   ##
==========================================
- Coverage   91.15%   91.13%   -0.02%     
==========================================
  Files         163      163              
  Lines       49534    49534              
==========================================
- Hits        45153    45144       -9     
- Misses       4381     4390       +9
Flag Coverage Δ
#multiple 88.92% <100%> (ø) ⬆️
#single 40.22% <0%> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/io/parsers.py 95.46% <100%> (ø) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.72% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fdbc6b8...f151405. Read the comment docs.

@jreback
Copy link
Contributor

jreback commented Sep 8, 2017

pls add your example: #13374 (comment) as a test.

@@ -411,6 +411,7 @@ I/O
- Bug in :func:`read_csv` when called with a single-element list ``header`` would return a ``DataFrame`` of all NaN values (:issue:`7757`)
- Bug in :func:`read_stata` where value labels could not be read when using an iterator (:issue:`16923`)
- Bug in :func:`read_html` where import check fails when run in multiple threads (:issue:`16928`)
- Bug in :func:`read_csv` where automatic delimiter detection caused a `TypeError` to be thrown when a bad line was encountered (:issue:`13374`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use double-back-ticks around TypeError

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add:

rather than the correct error message.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated both. Looks like you're going to have to manually restart the travis build though, something went wrong there and I'm not able to see any log info.

@matthax
Copy link
Contributor Author

matthax commented Sep 8, 2017

@jreback added the test and made the requested changes in whatsnew

@@ -218,6 +218,23 @@ def test_multi_char_sep_quotes(self):
self.read_csv(StringIO(data), sep=',,',
quoting=csv.QUOTE_NONE)

def test_none_delimiter(self):
# see gh-13374 and gh-17465
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a 1-line comment about what is happening here.

Is this only in the python parser as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only the Python parser. I actually discovered the issue because I was using the built in CSV sniffer and the C parser for a while, but switched to the python engine with the pandas sniffer because it did notably better for the data files I was using.

@jreback
Copy link
Contributor

jreback commented Sep 9, 2017

@gfyoung pls have a look.

Copy link
Contributor Author

@matthax matthax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated the documentation for the test method. Couldn't find a nice way to one-line it though

@jreback jreback added this to the 0.21.0 milestone Sep 9, 2017
@jreback
Copy link
Contributor

jreback commented Sep 9, 2017

lgtm. @gfyoung pls review and merge.


# We expect the third line in the data to be
# skipped because it is malformed
# but we do not expect any errors to occur
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nits: add a comma after "malformed" + add a period at end of comment.

@@ -2836,7 +2836,8 @@ def _rows_to_cols(self, content):
for row_num, actual_len in bad_lines:
msg = ('Expected %d fields in line %d, saw %d' %
(col_len, row_num + 1, actual_len))
if len(self.delimiter) > 1 and self.quoting != csv.QUOTE_NONE:
if self.delimiter and \
len(self.delimiter) > 1 and self.quoting != csv.QUOTE_NONE:
Copy link
Member

@gfyoung gfyoung Sep 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally practice for is to use parentheses around the conditionals and not to use the slash for something a little nicer to read i.e.:

if (self.delimiter and
    len(self.delimiter) > 1...)

@gfyoung gfyoung self-assigned this Sep 9, 2017
@matthax
Copy link
Contributor Author

matthax commented Sep 9, 2017

Just and FYI the flake check doesn't seem to be working on windows. I tried altering the commands as suggested but no dice. It doesn't error but it doesn't show me any issues. I swapped to using pylint and that seems to work ok, but perhaps that's something that needs to be looked into.

edit running it through the git-bash shows nothing either
image

@gfyoung
Copy link
Member

gfyoung commented Sep 9, 2017

Just and FYI the flake check doesn't seem to be working on windows.

Hmmm...we've been having this issue off-and-on with Windows. Not sure why it wouldn't have been caught on the diff.

@gfyoung gfyoung removed their assignment Sep 10, 2017
@gfyoung gfyoung merged commit 23050dc into pandas-dev:master Sep 10, 2017
@gfyoung
Copy link
Member

gfyoung commented Sep 10, 2017

Thanks @matthax !

@matthax matthax deleted the python_parser_patch branch September 10, 2017 12:48
alanbato pushed a commit to alanbato/pandas that referenced this pull request Nov 10, 2017
No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug IO CSV read_csv, to_csv
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Python parser breaks with quotes and multi-char sep
4 participants